# Performance Comparison of Rns Modular Adder Based On Existing Parallel Prefix Trees And Random Number Generator Design 

${ }^{1}$ Anjitha Purushothaman, ${ }^{2}$ Divya S.<br>${ }^{1,2}$ Electronics and Communication Engineering Department, Sree Narayana Gurukulam College of Engineering, Kadayiruppu, Kerala, India


#### Abstract

Residue number system is widely used in application like cryptography, numeral analysis and signal processing. Random number generators are extensively used in cryptography and its design is based on traditional linear feedback shift registers. Residue number system are very suitable for the implementation of fast VLSI system. Modular adder is the basic component in RNS system. Suitable parallel prefix trees are used for the modular adder design. In this paper a performance comparison of modular adder of moduli set $2^{n}$ -$2^{k}-1$ is made and a random number generator based on this adder is proposed to generate random numbers with good randomness properties desirable for cryptographic applications. Moduli set with the form of $2^{n}-$ $2^{k}-1(1 \leq k \leq n-2)$ is best suitable for multichannel RNS processing. The proposed model offers excellent randomness property which is the basic requirement for network security and also offers better area and delay performance.


Keywords: Residue number system, parallel prefix adder,modular adder,carry correction, FPGA

## I. Introduction

The demand for new techniques in network security is increasing with the growth of network services in our world. Cryptography plays an important role in network security and it is a vital tool that provides security against various external and internal threats in the network. Data confidentiality is mainly achieved by means of cryptography. The aim of cryptographic techniques is to secure the information so that only the intended parties can read. For transmitting audio and video signals for cable TV, commercial and sensitive data and video conferencing the speed of the cryptographic module is required to be high. However the traditional software implementation of cryptographic algorithms are not efficient in real time applications.


Fig 1 Cryptographic System
The random number generator is a vital cryptographic module widely used for key generation and authentication protocols. The security of such systems completely relies on the excellent randomness property provided by the generators. Thereby future sequence pattern in the random number sequence cannot be predicted by the observed sequence. The random number generators broadly categorized into true random number generator and pseudo random number generators. The design of cryptographically secure random number generator is extremely difficult. Linear feedback shift register is the traditional method for designing and generating random numbers which uses shift registers. This method has good statistical properties and leads to very efficient hardware implementation. Modular adders can be used to design LFSR for generating random numbers that can offer good randomness property.

Modular adders are the most prime component of residue number system. RNS is an ancient numerical representation system. It is a non weighted numerical representation system and have carry free property in
addition and multiplication operations. Modular adder is the key module for RNS based DSP systems. Moduli set with the form of $2^{n}-2^{k}-1$ can offer excellent balance among the RNS channels for multichannel RNS processing. This modular adder can be used to design LFSR based random number generator with good randomness property.

This paper is organized as follows: Section II is a review of the related work used.RNS and modular adder arithmetics are discussed in Section III. An introduction of Parallel prefix trees is presented in Section IV. Section V presents the proposed design and Section VI gives the performance comparison. Conclusion and future work is given in Section VII.

## II. Related Work

First a brief survey on modular adders were made and discussed the designing techniques of random number generators based on modular adders. Modular adders can be classified into two types: the general modular adder and the special modular adder and it is based on the form of modulus. In the former adder design the two values $\mathrm{A}+\mathrm{B}$ and $\mathrm{A}+\mathrm{B}+\mathrm{T}$ should be computed first and one of them is selected as the final output. Bayoumi and Miller [2] proposed a general modular adder for arbitrary modulus by using 2 cascaded binary adders and its delay is the sum of two binary adders. Several modular adders with two binary adders to calculate $\mathrm{A}+\mathrm{B}$ and $\mathrm{A}+\mathrm{B}+\mathrm{T}$ were proposed subsequently. However this approach offers better delay performance the area consumption is relatively larger and it is twice the binary adder. Reused binary adder configuration [3] is the another type of general modular adder design and was proposed by Dugdale. The drawback of this type of adder is that it will use two operation cycles to perform one modular addition. Subsequently many studies on modular adders were done that have better area and delay performance. A high speed and reduced area modular adder structure for RNS were proposed by Hiasat[6] where any regular carry lookahead based binary adder can be used in the final stage. This structure needs an extra CLA unit to get the carry out bit of $\mathrm{A}+\mathrm{B}+\mathrm{T}$ before the final CLA addition and as a result delay is not reduced significantly. ELMMA [9]algorithm is another popular modular adder design proposed by R.A. Patel. In this adder two carry computation modules for $\mathrm{A}+\mathrm{B}$ and $\mathrm{A}+\mathrm{B}+\mathrm{T}$ were used and some carry computation units were shared. But in the worst cases almost two independent carry generation modules were used. Dimitris Bakalis [10]proposed fast parallel prefix adder for modulo $2^{n}+1$ adders. This architecture is based on parallel prefix carry computation units.

The complexity of special modular adder is much less than that of general modular adder since optimization is possible in special modular adder. The optimization is done according to the modulus. Several architecture for modulo $2^{n}+1$ and $2^{n}-1$ adder were proposed based on parallel prefix and carry correction.[10][19][20]. Piestrak [4] made a comprehensive study of residue generators and multioperand modular adders. He proposed a design using carry save adder with end around carry and are well suited for VLSI implementation, R.A. Patel [13] first proposed a literature on modulo $2^{n}-\left(2^{n-2}+1\right)$ addition based on carry offset where the carry information of $\mathrm{A}+\mathrm{B}+\mathrm{T}$ is only required to calculate. The carry information for $\mathrm{A}+\mathrm{B}$ is obtained by modifying the carries of $\mathrm{A}+\mathrm{B}+\mathrm{T}$. Even if all the redundant modules of carry computation are eliminated, the structure of carry computation is fixed and can only perform the special modular addition of modulo $2^{n}-\left(2^{n-2}+1\right)$. In most of the RNS based application , addition and multiplication intensive systems are used and the main issue is the selection of moduli set accordingly. For such systems residue channels are always expected as many as possible where dynamic range is fixed ie. the wordlength of the residue channels can be reduced inorder to achieve better speed performance. Width of the each channel is also expected as close as possible to get similar critical path delay and thereby fine balance is achieved between each residue channels. The modular adders discussed yet are high performance adders but are not always suitable to construct multichannel RNS that offers fine channel balance ie. It is haed to construct a multichannel that have fine balance with moduli set $2^{n}+1$ and $2^{n}-1$. Recently [1] Shang Ma and Jian Ho proposed a modular adder particularly applicable for RNS systems with modulus of the form $2^{n}-2^{k}-1 \quad(1 \leq k \leq n-2)$. These adder have outstanding performance in constructing multichannel moduli set with fine balance. L.Li, J Hu and Y Chan recently proposed a general architecture for $2^{n}-2^{k}-1$ multiplier. However there were only little discussion and recently a detailed discussion was made [1].

Random number generators are the most vital component used in network security systems like cryptography and encryption techniques. Cryptography is mainly concerned with confidentiality where a message is converted from comprehensible form to incomprehensible form rendering it unreadable by interceptors and eavesdroppers. RNG[7] are basically classified as true and pseudo generators. A common method of producing a pseudo random number generators is to use the output of a linear feedback shift register.

Almost all PRNG patterns are reputable and predictable for small cycles. In this paper a new design for random number generator based on modular adder is proposed. This design uses a modular adder $2^{n}-2^{k}-1$ which has better area and delay performance. This adder is best suitable for RNS multichannel since a class of modulo is designed instead of a single moduli based on different values of k . Moreover when LFSR design based on $2^{n}-2^{k}-1$ adder and conventional modular adder are compared the proposed gives better area and delay performance. Since randomness is the most important requirement in cryptography it is very necessary to design such generator that have good randomness property. This proposed random number generator based on LFSR offers excellent randomness property.

In the rest of the paper a brief introduction of RNS and modular addition are presented in Section II. And Section III introduces the hardware architecture of modulo $2^{n}-2^{k}-1$ adder. Section IV describes the proposed random number generator design. Performance of different modular adders and LFSR are evaluated and compared in Section V.

## III. Rns And Modular Arithmetics.

RNS arithmetic seems especially suitable for DSP hardware as rapid computation using simple operation of addition subtraction and multiplication can be performed which the basic arithmetic operations of DSP algorithms. RNS arithmetic also has the desirable properties for VLSI implementation of concurrency , modularity and fault tolerant capability.

Residue number system consists of N pairwise relatively prime moduli. A number X is represented as $\left(|X|_{m 1},|X|_{m 2} \ldots \ldots .|X|_{m N}\right)$ where $|X|_{m} \in[0, m-1], \mathrm{N}>1, \operatorname{GCD}\left(m_{i}, m_{j}\right)=1, \mathrm{i}, \mathrm{j}=1,2, \ldots \mathrm{~N}$ and GCD is the greatest common divisor of $m_{i}$ and $m_{j}$. Let A and B be two integers represented by N -tuple word $\left\{a_{R N S}^{0}, a_{R N S}^{1} \ldots \ldots . a_{R N S}^{N-1}\right\}$ and $\left\{b_{R N S}^{0}, b_{R N S}^{1}, \ldots \ldots . b_{R N S}^{N-1}\right\}$ respectively in residue number systems. Let $\diamond$ denote the binary operation of addition ,subtraction and multiplication. Then $\mathrm{C}=\mathrm{A} \vee \mathrm{B}$ is isomorphic to $C=\left\{c_{R N S}^{0}, c_{R N S}^{1}, \ldots \ldots . . c_{R N S}^{N-1}\right\}$ where $c_{R N S}^{i}=\left|a_{R N S}^{i} \diamond b_{R N S}^{i}\right|$ and $\mathrm{i} \in[0, N-1] . c_{R N S}^{i}$ is solely dependent on $a_{R N S}^{i}$ and $b_{R N S}^{i}$, this results in fast, parallel, independent processing within each of the N residue channels. The modulo m addition for integers A and B in the range of $[0, \mathrm{~m}]$ is defined as

$$
\mathrm{C}=\langle A+B\rangle_{m}=\left\{\begin{array}{c}
A+B A+B<m  \tag{1}\\
A+B-m A+B \geq m
\end{array}\right.
$$

If $\mathrm{C}=\langle A+B\rangle_{m}$ and the bit width of the modular adder is n bit where $\mathrm{n}=\left[\log _{2} m\right]$ ie n is the smallest integer no less than $\log _{2} m$. Then eqn (1) can be represented as

$$
\mathrm{C}=\left\{\begin{array}{c}
A+B A+B+T<2^{n}  \tag{2}\\
\langle A+B+T\rangle_{2^{n}} A+B+T \geq 2^{n}
\end{array}\right.
$$

Where the correction factor $T=2^{n}-m$.that is if the carry out bit of $\mathrm{A}+\mathrm{B}+\mathrm{T}$ is ' 1 ' then the result of the modular addition is the least significant bits of $\mathrm{A}+\mathrm{B}+\mathrm{T}$ otherwise the result is $\mathrm{A}+\mathrm{B}$.

## A. Parallel Prefix Addition

The key element in fast addition of two n-bit operands X and Y is in the reduction of the latency in the carry network. Carry computation can be considered as a prefix problem. This method is widely adopted in binary adder design where each sum bit $s_{i}$ and carry bit $c_{i}$ can be calculated with the previous carries and inputs. Prefix based binary adder can be divided into 3 units, the preprocessing unit, prefix computation and sum computation unit.In the preprocessing unit , prefix computation is calculated as

$$
\left(g_{i}, p_{i}\right)=\left(a_{i} b_{i}, a_{i} \oplus b_{i}\right)
$$

where $g_{i}$ and $p_{i}$ represents the $i^{t h}$ carry generation and carry propagation bits respectively. The prefix computation unit is used to compute the carry information used in the sum computation unit. For carry computation group generate and group propagate bits are obtained from $g_{i}$ and $p_{i}$ respectively.


Fig 2. Prefix adder.

$$
\left\{\begin{align*}
& \left(G_{i: i}^{0}, P_{i: i}^{0}\right)=\left(g_{i}, p_{i}\right)  \tag{4}\\
\left(G_{i: k}^{l}, P_{i: k}^{l}\right) & =\left(G_{i: j+1}^{l-1}, P_{i: j+1}^{l-1}\right) \bullet\left(G_{j: k}^{l-1}, P_{j: k}^{l-1}\right) \\
& =\left(G_{i: j+1}^{l-1}+P_{i: j+1}^{l-1} G_{j: k}^{l-1}, P_{i: j+1}^{l-1} P_{j: k}^{l-1}\right)
\end{align*}\right.
$$

Where $\mathrm{i}=0,1, \ldots \mathrm{n}-1,0 \leq k \leq j \leq l l=1,2, \ldots . m$ and 1 represents the $l^{\text {th }}$ stage. There are several well known binary prefix addition structures such as Sklansky, Brent Kung, Kogge Stone, Han Carlson. There structures are usually called prefix trees. After prefix computation carries are obtained $c_{i,} \mathrm{i}=0,1,2 \ldots . \mathrm{n}$ for $i^{t h}$ bit and computed as
$\left\{\begin{array}{c}c_{0}=c_{\text {in }} \\ c_{i}=G_{i-1: 0}^{l}+P_{i-1: 0}^{l} c_{\text {in }} \\ c_{\text {out }}=c_{n}\end{array}\right.$
In the sum computation unit the carries $c_{i}$, from prefix computation unit and partial sum $p_{i}$ from the preprocessing unit are used together to compute the final sum $s_{i}$.
$s_{i}=p_{i} \oplus c_{i} i=0,1, \ldots . n-1$

## IV. Parallel Prefix Trees

Parallel prefix structures are widely used in high performance adders as the delay is logarithmically proportional to the adder width. Parallel prefix adders basically consists of three stages and are pre processing, prefix computation and final sum computation. The parallel prefix trees are used in the prefix computation stage where group generate/propagate signals are computed at each bit. Any existing prefix structures can be used to get the carries. In this work different parallel prefix trees areused in the adder of modulo $2^{n}-2^{k}-1$ and their performance are analysed. Sklansky, Kogge Stone, Brent Kung, Han Carlson are the common prefix trees that are used.

## A. Sklansky

Sklansky prefix tree is the simplest among the prefix trees. Fig 3 shows the Sklansky prefix tree for 16 bit operand. It has the least logic levels to compute carries. It uses less cells than Knowles and Kogge Stone prefix structure. But this tree have higher fanout. The fanout of such adders grows exponentially from input to output. This leads to a large delay as tHe adder operands's width increases. It is the compacted version of Brent Kung where logic level is reduced and fanout is increased.


Fig 3 Sklansky Prefix Tree

## B Kogge Stone

This tree has a regular layout. It is widely considered in the fastest adder design and is used for high performance adders. Fig 4 shows the Kogge Stone prefix tree for 16 bit operand It takes more area to implement than the Brent Kung, but has a lower fanout at each stage and this increases the performance. Wiring congestion is often a problem for Kogge Stone adder as well


Fig 4 Kogge Stone Prefix Tree

### 4.3 Brent Kung

Brent Kung trees have large number of levels and so it reduces its operational speed. Fig 5 shows the Brent Kung prefix tree for 16 bit operand It have the lowest area and delay with large number of inputs and hence it is power efficient. The number of cells required is less than the Kogge Stone therefore it takes less area to implement. However this adder is not the best for very fast addition. Brent Kung prefix tree is a bit complex to build because it has the most logice levels.


Fig 5 Brent Kung Prefix Tree

### 4.4 Han Carlson

The idea of Han Carlson prefix tree is very similar to Kogge Stone as it has a maximum fanout of 2. And it uses less cells and wire tracks than the Kogge Stone. Fig 6 shows the Han Carlson prefix tree for 16 bit operand. It requires extra logic level. It is viewed as a sparse version of Kogge Stone prefix tree.


Fig 6 Han Carlson Prefix Tree

## V. Novel $2^{n}-2^{k}-1$ Adder

The adder structure used for the design of random number generator is shown in fig.... it is of modulus $2^{n}-2^{k}-1$ and composed of four units, preprocessing unit, carry generation, carry correction and sum computation unit. Generally this adder structure can be divided into two general binary adders A1 and A2 as shown in fig 7 with carry correction and sum computation unit. This is based on the characteristics of correction T for modulus $2^{n}-2^{k}-1$. any existing prefix structures can be used to compute the carries of $\mathrm{A}+\mathrm{B}+\mathrm{T}, C_{i}^{T}$. and by correcting the carries $C_{i}^{T}$ we can obtain the final carries $C_{i}^{\text {real }}$ used in the final stage. Thus final modular addition result is obtained from $C_{i}^{\text {real }}$ and partial sum information from the preprocessing units. The main interesting feature of thisarchitecture is that it avoids the calculation of carry information for $\mathrm{A}+\mathrm{B}+\mathrm{Tand}$ A+Bseparately. Thereby area and delay can be reduced significantly and offers flexible tradeoff between area and delay.

## A.Pre processing unit

This unit computes carry generation and carry computation bits for every bit $\mathrm{I}, i \in[0 . n-1]$. When modulus $m=2^{n}-2^{k}-1$,then the correction factor is given as $T=2^{\left[\log _{2} 2^{n}-2^{k}-1\right]}-m .=2^{k}+1$. The binary representation of T is $00 \ldots 00100 \ldots . .001$. the computation of $\mathrm{A}+\mathrm{B}+\mathrm{T}$ can be performed by A 1 and A 2 where A 1 and A2 are used for lower-k bits and higher n-k bits addition respectively. Let $T_{A 1}=00 \ldots 001$, $T_{A 2}=00 \ldots 001$ and the binary representation of A and B are $a_{n-1} \ldots . a_{k-1} a_{k} \ldots a_{1} a_{0}$ and $b_{n-1} \ldots . b_{k-1} b_{k} \ldots . b_{1} b_{0}$ respectively.


Fig 7. $2^{n}-2^{k}-1$ Adder structure
The operation of adder A1 and A2 can be given as
$\left\{\begin{array}{c}S_{A 1}=a_{k-1} \ldots . a_{0}+b_{k-1} \ldots . b_{0}+T_{A 1} \\ S_{A 2}=a_{n-1} \ldots . a_{k}+b_{n-1} \ldots . b_{k}+T_{A 2}+c\end{array}\right.$
Where $C_{A 1}$ is the carry out bit of adder A1. This LSB bits of $T_{A 1}$ is ' 1 ' and all others are ' 0 '. This A1 can be treated as a k-bit adder with lowest carry in bit since $T_{A 1}$ is one of the input ofA1. Since the LSB bit of $T_{A 1}$ is ' 1 ' it is considered for the carry generation and carry propagation bits and are computed as
$\left\{\begin{array}{cc}\left(g_{0}, p_{0}\right)=\left(a_{0}+b_{0}, \overline{a_{0} \oplus b_{0}}\right) & i=0 \\ \left(g_{i}, p_{i}\right)=\left(a_{i} b_{i}, a_{i} \oplus b_{i}\right) & i=1,2, \ldots . k-1\end{array}\right.$

The second adder A2 adds the constant $T_{A 2}$ and the carry out bit $C_{A 1}$ from adder A1. So it can be regarded as a 3 -input adder with lowest carry in bit. The 3 -inputs are $a_{n-1} \ldots . a_{k}, b_{n-1} \ldots b_{k}$ and $T_{A 2}$. In this design the number of inputs is reduced from three to two for adder A2 by using Simple Carry Save adder (SCSA). The first stage of SCSA computes $\left(g_{i}^{\prime}, p_{i}^{\prime}\right)$ for $i=k, k+1, \ldots n-1$
$\left(g_{i}^{\prime}, p_{i}^{\prime}\right)=\left(a_{i} b_{i}, a_{i} \oplus b_{i}\right)$ (9)
This $g_{i}^{\prime} p_{i}^{\prime}$ are treated as the inputs of the second stage in SCSA. The carry generation and carry propagation bits for $i=k, k+1, \ldots n-1$ are obtained from this second stage from $\left(g_{i}^{\prime}, p_{i}^{\prime}\right)$ and $T_{A 2}$. Thus the final output for preprocessing unit are
$\left\{\begin{array}{c}\left(g_{k}, p_{k}\right)=\left(p_{k}^{\prime}, g_{k}^{\prime}\right) i=k \\ \left(g_{i}, p_{i}\right)=\left(p_{i}^{\prime} g_{i-1}^{\prime}, p_{i}^{\prime} \oplus g_{i-1}^{\prime}\right) i=k+1, \ldots, n-1\end{array}\right.$
The carry out bit of SCSA, $C_{S C S A}$ is required to compute the carry out bit of $\mathrm{A}+\mathrm{B}+\mathrm{T}, C_{\text {out }}$. It is calculated as
$C_{S C S A}=a_{n-1} b_{n-1}=g_{n-1}^{\prime}$ (11)

## B.Carry Generation Unit

This unit uses any existing prefix structure to compute the carries $C_{i}^{T}$ of $\mathrm{A}+\mathrm{B}+\mathrm{T}$ from carry generation and carry propagation bits of pre processing unit. The carry out bit of SCSA is not involved in the prefix computation. $C_{S C S A}$ is combined with the carry out bit of prefix tree and determines the carry out bit of $\mathrm{A}+\mathrm{B}+\mathrm{T}$, $C_{\text {out }}$.

$$
\begin{aligned}
& c_{\text {out }}=c_{S C S A}+c_{n}^{T}=c_{S C S A}+G_{n-1: 0} \\
& \quad=c_{S C S A}+G_{n-1: l}+P_{n-1: l} G_{n-1: l}
\end{aligned}
$$

$=c_{S C S A}+G_{n-1: l}+P_{n-1: l} c_{l}^{T}(12)$

## C. Carry correction unit

The final carries $C_{i}^{\text {real }}$ used in the final sum computation stage for each bit is obtained from the unit. In this design the carries for $\mathrm{A}+\mathrm{B}$ is $C_{i}^{\text {real }}$ is obtained by correcting the carries $C_{i}^{T}$ of $\mathrm{A}+\mathrm{B}+\mathrm{T}$. Hence area is reduced.The relation between $C_{i}^{0}$ and $C_{i}^{1}(i=0,1, \ldots n-1)$ is derived where $C_{i}^{0}$ and $C_{i}^{1}$ are the carry outputs of prefix trees when the lowest carry in is ' 0 ' and ' 1 ' respectively. The relationship is given as
$C_{i+1}^{0}=\overline{P_{i: 0}} C_{i+1}^{1}(i=0,1, \ldots n-1)(13)$
Where $P_{i: 0}=p_{i} p_{i-1} \ldots \ldots . p_{0}$ and $p_{i}=a_{i} \oplus b_{i}$. This means that $C_{i}^{0}$ can be determined from $C_{i}^{1}$ by simple logic operation. This is the foundation of carry correction for this modular adder. The carry bit of $\mathrm{A}+\mathrm{B}$ can be obtained with twice carry correction of $\mathrm{A}+\mathrm{B}+\mathrm{T}$. Whether carry correction is performed or not depends on the carry our bit of $\mathrm{A}+\mathrm{B}+\mathrm{T}, C_{\text {out }}$.
Therefore the carry bits required by the modular adder are given as

$$
C_{i}^{\text {real }}= \begin{cases}c_{i+1}^{T}\left(c_{\text {out }}+\overline{P_{i: 0}}\right) & i=0,1, \ldots, k-1  \tag{14}\\ c_{i+1}^{T}\left(c_{\text {out }}+\overline{P_{k-1: 0}}\left(p_{k} \oplus c_{k}^{T}\right)\right) & i=k \\ c_{i+1}^{T}\left(c_{\text {out }}+\overline{P_{i: k+1}}+\overline{P_{k-1: 0}}\left(p_{k} \oplus c_{k}^{T}\right)\right) & i=k+1, \ldots, n-2\end{cases}
$$

## D . Sum computation unit

This unit computes the final result for modular adder. It is same as that in prefix based binary adder. $C_{i}^{\text {real }}$ is used for final computation of sum with respect to $C_{\text {out }}$. If $C_{o u t}=0, C_{i}^{\text {real }}$ is the carry bit of $\mathrm{A}+\mathrm{B}$, otherwise it is the carry bit of $A+B+T$. the partial sum bits of $A+B$ and $A+B+T$ are both required in the final sum computation.Let $p_{i}^{0}$ and $p_{i}^{1}$ be the partial sum of $\mathrm{A}+\mathrm{B}$ and $\mathrm{A}+\mathrm{B}+\mathrm{T}$ respectively

$$
\left\{\begin{align*}
p_{0}^{0} & =\overline{p_{0}}, p_{0}^{1}=p_{0} i=0  \tag{15}\\
p_{k}^{0} & =\overline{p_{k}}, p_{k}^{1}=p_{k} i=k \\
p_{i}^{0}=p_{i}^{1}=p_{i} i & =1, \ldots, k-1, k+1, \ldots, n-1
\end{align*}\right.
$$

Hence

$$
\left.\begin{array}{rl} 
& s_{0}=\overline{c_{\text {out }}} p_{0}^{0}+c_{\text {out }} p_{0}^{1}=\overline{c_{\text {out }}} p_{0}+c_{\text {out }} p_{0}=c_{\text {out }} \oplus \overline{p_{0}} \\
& s_{k}^{\text {real }} \oplus c_{k}=c_{k}^{\text {real }} \oplus\left(\overline{c_{\text {out }}} p_{k}^{0}+c_{\text {out }} p_{k}^{1}\right)=c_{k}^{\text {real }} \oplus\left(\overline { p _ { k } } \left(\overline{\text { out }} p_{k}\right.\right.
\end{array}+c_{\text {out }} p_{k}\right) .
$$

When $i=1, \ldots ., k-1, k+1, \ldots, n-1$

$$
s_{i}=c_{i}^{\text {real }} \oplus p_{i}
$$

IOSR Journal of Electronics and Communication Engineering (IOSR-JECE)
e-ISSN: 2278-2834, p-ISSN: 2278-8735.
PP 97-107
www.iosrjournals.org

Therefore the sum bits for all i are,

$$
s_{i}=\left\{\begin{array}{c}
c_{\text {out }} \oplus \overline{p_{0}} i=0  \tag{18}\\
c_{k}^{\text {real }} \oplus c_{\text {out }} \oplus \overline{p_{k}} i=k \\
c_{i}^{\text {real }} \oplus p_{i} i=1, \ldots, k-1, k+1, \ldots, n-1
\end{array}\right.
$$

This $C_{\text {out }} \oplus \overline{p_{k}}$ and $C_{i}^{\text {real }}$ can be obtained at the same time. Therefore there is no extra delay


Fig 8 modulo $2^{8}-2^{4}-1$ adder

## VI. Proposed System

The data confidentiality in secured communication system is achieved by means of cryptography. This security is maintained by keeping the secret key used for data encryption and decryption confidential. The secret keys should be extremely strong enough so that attackers and eavesdroppers could not predict out and break the cipher text and misuse it.. Therefore we require strong keys. The keys are usually generated by simple random number generators. And the random numbers generated must have excellent randomness properties. Fig 4 shows a conventional random number generator based on linear feedback shift register.


Fig 9. Conventional LFSR
The generator is uses XOR based feedback. The input of shift register is the linear function of previous states. Fig shows the proposed design for random number generator that have excellent randomness properties. In this design the XOR based feedback is replaced by modular adder.


Fig 10 Proposed random number generator.

IOSR Journal of Electronics and Communication Engineering (IOSR-JECE)
e-ISSN: 2278-2834, p-ISSN: 2278-8735.
PP 97-107
www.iosrjournals.org

Whenever the sum exceeds the modulus, the adder produces an exactly different result as sum. By keeping the moduli sets used in the design of modular adders confidential we can produce extremely strong cipher texts rendering it unreadable by interceptors and eavesdroppers. The moduli set $2^{n}-2^{k}-1$ is very suitable in constructing balanced multichannel with fixed dynamic range and similar critical path delay. So by employing $2^{n}-2^{k}-1$ modular adder we could get generator with excellent randomness property, large dynamic range and better performance which is very suitable for cryptography application.

## 6. Fpga Implementation And Performance Comparison

## A. FPGA Implementation

To understand the effectiveness of the proposed design, it is implemented on FPGA of device family SPARTAN 3E. this id synthesized in Xilinx 13.3 version. The simulation result for $2^{n}-2^{k}-1$ modular adder and random number generator are shown. Table 1 shows the device utilization summary and timing report for various adder and the proposed adder. Fig 6 and Fig 7 shows the simulation waveform of modular adder and random number generator.

Table. 1. Logic utilization and Timing report

| Modular adders | No of Slices <br> (Available 4656) | No of 4 i/p LUTs <br> (Available 9312) | Delay (ns) |
| :--- | :--- | :--- | :--- |
| Bayoumi [2] | 17 | 29 | 14.687 |
| Dugdale [3] | 25 | 41 | 33.735. |
| $2^{\wedge(n)-2^{\wedge}(\mathrm{n}-2)-1[13]}$ | 12 | 21 | 11.831. |
| $2^{\wedge} \mathrm{n}-2^{\wedge} \mathrm{k}-1$ | 19 | 33 | 14.878 |

Area and delay of the $2^{n}-2^{k}-1$ adder influence the area and delay of proposed LFSR design. So it is required to reduce the factors of adder so that we can improve the performance of LFSR correspondingly. In [2] where two binary adders are used to get $\mathrm{A}+\mathrm{B}$ and $\mathrm{A}+\mathrm{B}+\mathrm{T}$ simultaneously. Since two adders are used the area requirement is much greater than other adders. In [3] where single binary adder is used to compute the result but requires twice the clock cyles. Delay is much increased in this scheme. An another scheme of modulo $2^{n}-2^{n-2}-1$ which is a special case of our adder has relatively small delay and give better area delay performance. The adder $2^{n}-2^{k}-1$ adder offers a difference set of moduli for difference values of ' $k$ '. That is it is a general design architecture for different modulus. Therefore some optimizations and extra design are applied for such purpose making it suitable for multichannel RNS application. However even if these optimizations are done it still offers better area and delay performance compared to [2][3] [13]. Thus by employing this adder fastest and large dynamic range LFSR can be obtained with excellent randomness property.

IOSR Journal of Electronics and Communication Engineering (IOSR-JECE)
e-ISSN: 2278-2834, p-ISSN: 2278-8735.
PP 97-107
www.iosrjournals.org


Fig 11 . Simulation of $2^{8}-2^{4}-1$ modular adder.


Fig 12 . Simulation of proposed random number generator.
Table II. Logic utilization and Timing report for $2^{n}-2^{k}-1$ modular adder with different parallel prefix trees.

| Parallel Prefix <br> Trees | Sklansky | Kogge Stone | Brent Kung | Han Carlson | Harris |
| :---: | :---: | :---: | :---: | :---: | :---: |
| No of Slices | 19 | 19 | 18 | 22 | 22 |
| No of 4 I/P LUTs | 33 | 34 | 33 | 38 | 38 |
| No of Bonded <br> IOBs | 24 | 24 | 24 | 24 | 24 |
| Delay(ns) | 14.878 | 14.637 | 14.667 | 14.684 | 14.684 |

## VII. Conclusion

In this paper a new design approach for random number generator using modular adder is proposed. The proposed design consists of shift registers and modular adders. And modular adder is constructed of four units, preprocessing, carry computation, carry correction and sum computation unit. An analysis of performance of modular adders with different parallel prefix trees are also made. On comparison the Sklansky prefix tree offers the best area and delay performance with least number of logic levels. Since the modular adder use twice
carry corrections instead of carry computation improved the area and timing in VLSI implementation and reduces the redundant units of parallel computation of $\mathrm{A}+\mathrm{B}+\mathrm{T}$ and $\mathrm{A}+\mathrm{B}$ in the traditional adder. Hence comparison shows the LFSR designed using $2^{n}-2^{k}-1$ offer better area and delay performance when compared with traditional adders. The modulus with the form of $2^{n}-2^{k}-1$ facilitates the construction of RNS channels with large dynamic and more balanced complexity among each residue channels.

## References

[1] Shang Ma, Jian Hao Hu and Chen Hao, "A novel modulo $2^{n}-2^{k}-1$ adder for residue number system" IEEE Transactions On Circuits And Systems-I: Regular Papers. vol. 60, no. 11, pp. 2962-2972, May. 2013
[2] M. Bayoumi, G. Jullien, and W. Miller, "A VLSI implementation of residue adders," IEEE Trans. Circuits Syst., vol. CAS-34, no. 3, pp. 284-288, Mar. 1987.
[3] M. Dugdale, "VLSI implementation of residue adders based on binary adders," IEEE Trans. Circuits Syst. II: Analog Digit. Signal Process., vol. 39, no. 5, pp. 325-329, May 1992.
[4] S. J. Piestrak, "Design of residue generators and multioperand modular adders using carry-save adders," IEEE Trans. Comput,, vol. 43, no. 1, pp. 68-77, Jan. 1994.
[5] A. A. Hiasat, "High-speed and reduced-area modular adder structures for RNS," IEEE Trans. Comput,, vol. 51, no. 1, pp. 84-89, Jan. 2002.
[6] Cerda J.C., Martinez C.C., Corner J.M. and Hoe, "An Efficient FPGA Random Number Generator Using LFSR and Cellular Automata", IEEE Trans. Circuits \& Systems,pp.912-915, Aug 2012.
[7] Erkek E and Tuncer T., "The implementation of ASG and SG Random Number Generator", IEEE International Conference on Science and Engineering,pp 363-367,July 2013.
[8] Liang W. and Long Jing, "A Cryptographic Algorithm Based on Linear Feedback Shift Register", IEEE International Conf on Computer Applications and Systems Modeing, vol. 15, pp 526-529, Oct 2010.
[9] R. A. Patel, M. Benaissa, N. Powell, and S. Boussakta, "ELMMA: Anew low power high-speed adder for RNS," in Proc. IEEE Workshopon Signal Processing Systems, Oct. 2004, pp. 95-100.
[10] E. Vassalos, D. Bakalis, and H. T. Vergos, "Modulo $2^{\text {n }}+1$ arithmetic units with embedded diminished-to-normal conversion," in Proc. $14^{\text {th }}$ Euromicro Conf. Digital System Design (DSD), 2011, pp. 468-475
[11] R. A. Patel and S. Boussakta, "Fast parallel-prefix architectures for modulo $2^{n}-1$ addition with a single representation of zero," IEEE Trans. Comput., vol. 56, no. 11, pp. 1484-1492, Nov. 2007.
[12] S. H. Lin and M. H. Sheu, "VLSI design of diminished-one modulo $2^{n}+1$ adder using circular carry selection," IEEE Trans. Circuits Syst.II, Exp. Briefs, vol. 55, no. 9, pp. 897-901, Sep. 2008.
[13] R. A. Patel, M. Benaissa, and S. Boussakta, "Fast modulo $2^{n}-\left(2^{n-2}+1\right)$ addition: A new class of adder for RNS," IEEE Trans. Comput., vol. 56, no. 4, pp. 572-576, Apr. 2007.
[14] R. A. Patel, M. Benaissa, N. Powell, and S. Boussakta, "Novel power-delay-area-efficient approach to generic modular addition," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 54, no. 6, pp. 1279-1292, Jun. 2007.
[15] P. M. Matutino, R. Chaves, and L. Sousa, "Arithmetic units for RNS moduli $2^{n}-3$ and $2^{n}+3$ operations," in Proc. 13th Euromicro Conf. Digital System Design: Architecture, Methods and Tools (DSD), 2010, pp. 243-246.
[16] E. Vassalos, D. Bakalis, and H. T. Vergos, "Modulo arithmetic units with embedded diminished-to-normal conversion," in Proc. $14^{\text {th }}$ Euromicro Conf. Digital System Design (DSD), 2011, pp. 468-475.
[17] P. Patronik, K. Berezowski, S. J. Piestrak, J. Biernat, and A. Shrivastava, "Fast and energy-efficient constant-coefficient FIR filters using residue number system," in Proc. Int. Symp. Low Power Electronics and Design (ISLPED), 2011, pp. 385-390.
[18] S. Ma, J. H. Hu, L. Zhang, and L. Xiang, "An efficient RNS parity checker for moduli set $\left\{2^{\mathrm{n}}-1,2^{\mathrm{n}}+1,2^{2 \mathrm{n}}+1\right\}$ and its applications,"Sci. in China, Ser. F: Inform. Sci., vol. 51, no. 10, pp. 1563-1571, Oct. 2008.
[19] G. Jaberipur and S. Nejati, "Balanced minimal latency RNS addition for moduli set $\left\{2^{\text {n }}-1,2^{\text {n }}, 2^{2 n}+1\right\}$,"" in Proc. $18^{\text {th }}$ Int. Conf. Systems, Signals and Image Processing (IWSSIP), 2011, pp. 1-7.
[20] H. T. Vergos and C. Efstathiou, "A unifying approach for weighted and diminished-1 modulo addition," IEEE Trans. Circuits Syst. II,Exp Briefs, vol. 55, no. 10, pp. 1041-1045, Oct. 2008.
[21] H. Vergos, "On the design of efficient modular adders," J. Circuits, Syst., and Comput., vol. 14, no. 5, pp. 965-972, Oct. 2005.
[22] R. Zimmermann, "Binary Adder Architectures for Cell-Based VLSI and their Synthesis," Ph.D. dissertation, Integrated Syst. Lab., Swiss Federal Inst. of Technol., Zurich, 1997.

